Netflix Data Analysis

Data Preprocessing

(Cleaning data)

1. Data Types

Type

Year Released/Added

Rating

Duration
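
The type conversions listed above could look roughly like the sketch below. The column names (`type`, `date_added`, `release_year`, `rating`, `duration`) and the date format are assumptions based on the public Netflix titles dataset, and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; column names and formats are assumptions.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "date_added": ["September 25, 2021", "January 1, 2020", None],
    "release_year": [2020, 2019, 2015],
    "rating": ["PG-13", "TV-MA", "TV-Y"],
    "duration": ["90 min", "2 Seasons", "125 min"],
})

# Type and rating take only a few distinct values, so store them as categoricals.
df["type"] = df["type"].astype("category")
df["rating"] = df["rating"].astype("category")

# Parse the added date and keep only the year; missing dates become <NA>.
added = pd.to_datetime(df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce")
df["year_added"] = added.dt.year.astype("Int64")

# Split duration into a number and a unit ("min" for movies, "Season(s)" for TV shows).
parts = df["duration"].str.extract(r"(?P<value>\d+)\s*(?P<unit>\w+)")
df["duration_value"] = parts["value"].astype(int)
df["duration_unit"] = parts["unit"]
```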

2. Missing data

Notice:

* A large share of the director and cast values is missing
* Some country values are missing
* A small portion of the year-added values is missing

Question: What percentage of the values in each column is NaN?
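
One way to answer this, as a minimal sketch on made-up rows (the column names are assumptions): `isna()` marks missing cells, and the column-wise mean of those flags is the missing fraction.

```python
import pandas as pd

# Hypothetical sample; the real data has more columns and rows.
df = pd.DataFrame({
    "director": [None, "A", None, None],
    "country": ["India", None, "United States", "Japan"],
    "title": ["t1", "t2", "t3", "t4"],
})

# Fraction of NaN per column, expressed as a percentage, largest first.
missing_pct = (df.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct.round(1))
```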

Action(s):

1. Drop director and cast: a large share of their values is missing, and they are not needed for this analysis (they might be needed for a recommender system)
2. Drop description because its free-text content is too complex to analyze here
3. Fill missing country values with US
4. Drop all remaining rows with missing data, since they are few and have little impact
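
The four actions above can be chained in one expression. This is a sketch on made-up rows; the column names and the fill value `"United States"` (the assumed spelling of US in the country column) are assumptions.

```python
import pandas as pd

# Hypothetical sample with the same kinds of gaps described above.
df = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4"],
    "director": [None, "A", None, None],
    "cast": [None, None, "B", None],
    "description": ["d1", "d2", "d3", "d4"],
    "country": ["India", None, "United States", "Japan"],
    "date_added": ["2019-01-01", "2020-05-01", None, "2021-03-01"],
})

clean = (
    df.drop(columns=["director", "cast", "description"])  # actions 1 and 2
      .fillna({"country": "United States"})               # action 3 (assumed spelling)
      .dropna()                                           # action 4
      .reset_index(drop=True)
)
```

Filling the country before calling `dropna()` matters: in the opposite order, the rows with a missing country would be dropped instead of filled.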

Exploratory Data Analysis

(Using the processed data to better understand Netflix's content and the trends in movies and TV shows)

1. Understanding the popularity of movies and TV shows on Netflix in different countries.

Notice:

* About 2/3 of Netflix's content is movies and 1/3 is TV shows

Question: How is the content distributed across countries?

Notice:

* Some entries list a combination of several countries

Notice:

* The US accounts for more than 50% of the titles in the top 10 countries
* The US, India, and the UK together contribute about 75% of the titles in the top 10
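
Shares like these can be derived from `value_counts()`. The sketch below uses a made-up `country` column and takes the top 3 instead of the top 10 to keep the sample small; combined entries are counted as their own value here, which is an assumption about how the counts above were produced.

```python
import pandas as pd

# Hypothetical country column, including one multi-country entry.
country = pd.Series(
    ["United States"] * 5
    + ["India"] * 3
    + ["United Kingdom"] * 2
    + ["United States, Canada"],
    name="country",
)

top = country.value_counts().head(3)   # use head(10) on the full data
share = top / top.sum() * 100          # percentage within the top group
print(share.round(1))
```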

Question: How is each content type distributed in the top 10 countries?

(ignoring entries that combine several countries)

Notice:

* There are about 2 times as many movies as TV shows in the US, Canada, France, Spain, Germany, and Mexico
* There are about 8 times as many movies as TV shows in India, and approximately the same number of each in the UK
* The pattern reverses in Japan and South Korea: the number of movies is about 1/3 to 1/2 the number of TV shows
* Entries that list several countries still need to be considered
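
A movie-to-TV-show ratio per country can be sketched with `pd.crosstab`. The rows below are made up, and the column names are assumptions. The last two lines show one possible way to handle the multi-country entries: split them and count the title once per listed country.

```python
import pandas as pd

# Hypothetical sample; the last row lists two countries.
df = pd.DataFrame({
    "country": ["India"] * 4 + ["Japan"] * 3 + ["India, Japan"],
    "type": ["Movie", "Movie", "Movie", "TV Show",
             "TV Show", "TV Show", "Movie", "Movie"],
})

# Single-country rows only, as in the analysis above.
single = df[~df["country"].str.contains(",")]
counts = pd.crosstab(single["country"], single["type"])
counts["movies_per_tv_show"] = counts["Movie"] / counts["TV Show"]

# Possible follow-up: give each listed country its own row before counting.
exploded = df.assign(country=df["country"].str.split(", ")).explode("country")
per_country = exploded["country"].value_counts()
```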

2. Exploring Netflix's content focus in recent years

Question: How many titles of each content type were added each year?

Action(s):

1. nf_movie --> DataFrame of movies, referred to as movie
2. nf_tv --> DataFrame of TV shows, referred to as tv
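
A sketch of the split and the yearly counts, on made-up rows. `nf_movie` and `nf_tv` are the names used above; the combined frame `nf` and the column names `type` and `year_added` are assumptions.

```python
import pandas as pd

# Hypothetical cleaned data with one row per title.
nf = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "Movie"],
    "year_added": [2018, 2018, 2018, 2019, 2019, 2019],
})

nf_movie = nf[nf["type"] == "Movie"]
nf_tv = nf[nf["type"] == "TV Show"]

# Titles added per year, one column per content type; years with no titles get 0.
added_per_year = (
    pd.DataFrame({
        "movie": nf_movie["year_added"].value_counts(),
        "tv": nf_tv["year_added"].value_counts(),
    })
    .fillna(0)
    .astype(int)
    .sort_index()
)
```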

Notice:

* Content growth started in 2013
* The number of movies grew much faster than the number of TV shows on Netflix --> Netflix is focusing on movies
* More than 1200 new movies were added in each of 2018 and 2019
* Additions decreased in 2020 and afterwards

Question: Does Netflix add content right after it is released? (Only data after 2013 is used.)

(Check whether there is a relationship between year_added and year_released)
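
One simple check is the gap between the two years. The sketch below uses made-up rows and reads "after 2013" as `year_added > 2013`, which is an assumption about the cutoff.

```python
import pandas as pd

# Hypothetical sample with the two year columns named as above.
df = pd.DataFrame({
    "year_released": [2010, 2016, 2018, 2019, 2019],
    "year_added": [2012, 2016, 2019, 2019, 2020],
})

recent = df[df["year_added"] > 2013].copy()   # assumed reading of "after 2013"
recent["lag_years"] = recent["year_added"] - recent["year_released"]

# Share of titles added in their release year, and the linear correlation.
same_year_share = (recent["lag_years"] == 0).mean()
corr = recent["year_added"].corr(recent["year_released"])
```

A lag distribution concentrated near 0 would support the idea that newly released content is added quickly; a long tail would point to older catalog titles.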

Notice:

* The highest increase in movies released is in 2017, and in 2020 for TV shows
* The trends in content released and content added look similar --> Netflix has been adding newly released content

3. Understanding what content is available for different target audiences (kids, teenagers, adults)

Notice:

* Most shows are for teens and adults; only a small portion is for kids.

Question: How does this look for each content type?

Notice:

* Both pie charts show that the largest share of content is for adults
* A larger share of TV shows than of movies is for kids.
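
Grouping by audience requires mapping each rating to a group. The mapping below is a hypothetical one chosen for illustration; the mapping actually used for the charts above is not stated, so treat `AUDIENCE` and the column names as assumptions.

```python
import pandas as pd

# Hypothetical rating-to-audience mapping (an assumption, not the source's).
AUDIENCE = {
    "TV-Y": "Kids", "TV-Y7": "Kids", "G": "Kids", "TV-G": "Kids",
    "PG": "Kids", "TV-PG": "Kids",
    "PG-13": "Teens", "TV-14": "Teens",
    "R": "Adults", "NC-17": "Adults", "TV-MA": "Adults",
}

# Made-up sample rows.
df = pd.DataFrame({
    "type": ["Movie"] * 4 + ["TV Show"] * 4,
    "rating": ["TV-MA", "R", "PG-13", "TV-Y", "TV-MA", "TV-14", "TV-Y7", "TV-MA"],
})

df["audience"] = df["rating"].map(AUDIENCE)

# Overall audience shares, then shares within each content type (rows sum to 1).
audience_share = df["audience"].value_counts(normalize=True)
by_type = pd.crosstab(df["type"], df["audience"], normalize="index")
```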

4. Finding the correlation between target audience and duration for each content type

Notice:

* Most movies run between 80 minutes and 2 hours. These movies are mostly for adults.
* Longer movies are made for teenagers and kids.
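
The duration pattern can be checked numerically as well as visually. In this sketch the frame name `movie`, the `audience` column, and the `"<n> min"` duration format are assumptions, and the rows are made up.

```python
import pandas as pd

# Hypothetical movie rows with an audience label already assigned.
movie = pd.DataFrame({
    "audience": ["Adults", "Adults", "Adults", "Teens", "Kids"],
    "duration": ["95 min", "110 min", "70 min", "130 min", "125 min"],
})

# Pull the number of minutes out of strings such as "95 min".
movie["minutes"] = movie["duration"].str.extract(r"(\d+)", expand=False).astype(int)

# Share of movies between 80 and 120 minutes (inclusive), and median length per audience.
share_in_range = movie["minutes"].between(80, 120).mean()
median_by_audience = movie.groupby("audience")["minutes"].median()
```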

Action:

1. Do the same to see the pattern in TV show durations

Notice:

* Most TV shows have 1 season. The number of shows decreases as the number of seasons grows (from 8 to 17).
* The distribution of TV show length is approximately the same for teens and kids.
* Short TV shows for adults make up a large share among the 3 groups.
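
For TV shows the duration is a season count, so a frequency table is enough. The frame name `tv` follows the naming used earlier; the `audience` column and the `"<n> Season(s)"` format are assumptions, and the rows are made up.

```python
import pandas as pd

# Hypothetical TV rows with an audience label already assigned.
tv = pd.DataFrame({
    "audience": ["Adults", "Adults", "Teens", "Kids", "Adults"],
    "duration": ["1 Season", "1 Season", "2 Seasons", "1 Season", "3 Seasons"],
})

# Pull the season count out of strings such as "2 Seasons".
tv["seasons"] = tv["duration"].str.extract(r"(\d+)", expand=False).astype(int)

# Number of shows per season count, overall and per audience group.
season_counts = tv["seasons"].value_counts().sort_index()
by_audience = pd.crosstab(tv["seasons"], tv["audience"])
```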

5. Exploring the genres of different types of movies and TV shows

Notice:

* There is a negative relationship between Dramas and Documentaries.
* Many Sci-Fi & Fantasy movies are also in Action & Adventure
* There is a positive relationship between Horror Movies and Thrillers

Notice:

* There is a negative relationship between Kids' TV and International TV Shows, but a good number of International TV Shows are Romantic TV Shows or TV Dramas
* There are many Documentaries in Science & Nature TV.
* There is a positive correlation between TV Horror and TV Mysteries
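
Genre relationships like these can be computed from one 0/1 column per genre. The sketch below assumes the genres sit in a comma-separated `listed_in` column (an assumption based on the public dataset) and uses made-up movie rows; the same steps apply to the TV frame.

```python
import pandas as pd

# Hypothetical movie rows with comma-separated genre labels.
movie = pd.DataFrame({"listed_in": [
    "Horror Movies, Thrillers",
    "Horror Movies, Thrillers",
    "Dramas, International Movies",
    "Documentaries",
    "Dramas",
]})

# One indicator column per genre: 1 if the title carries that label, else 0.
genre_flags = movie["listed_in"].str.get_dummies(sep=", ")

# Pairwise Pearson correlation between the genre indicators.
genre_corr = genre_flags.corr()
```

Genres that always appear together get a correlation of 1, while genres that rarely or never share a title get a negative value. On a tiny sample like this one the numbers are only illustrative.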

Notice:

* The most common movie genres are Dramas, International Movies, Documentaries, and Stand-Up Comedy
* Most TV shows are Kids' TV, International TV Shows, or TV Dramas
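
Genre frequencies can be counted by splitting the labels and giving each one its own row. As before, the `listed_in` column name is an assumption and the rows are made up.

```python
import pandas as pd

# Hypothetical comma-separated genre labels, one entry per title.
listed_in = pd.Series([
    "Dramas, International Movies",
    "Dramas",
    "Dramas, Documentaries",
    "International Movies",
], name="listed_in")

# One row per (title, genre) pair, then count how often each genre occurs.
genre_counts = listed_in.str.split(", ").explode().value_counts()
print(genre_counts)
```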